{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a0d0a1c8",
   "metadata": {},
   "source": [
    "# Document Question Answering\n",
    "\n",
    "An example of using Chroma DB and LangChain to do question answering over documents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "652985d8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.vectorstores import Chroma\n",
    "from langchain.embeddings import OpenAIEmbeddings\n",
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "from langchain.llms import OpenAI\n",
    "from langchain.chains import VectorDBQA\n",
    "from langchain.document_loaders import TextLoader"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01fe5351",
   "metadata": {},
   "source": [
    "## Load documents\n",
    "\n",
    "Load documents to do question answering over. If you want to do this over your documents, this is the section you should replace."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "5b352847",
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = TextLoader('state_of_the_union.txt')\n",
    "documents = loader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "478be861",
   "metadata": {},
   "source": [
    "## Split documents\n",
    "\n",
    "Split documents into small chunks. This is so we can find the most relevant chunks for a query and pass only those into the LLM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "7dd1adb2",
   "metadata": {},
   "outputs": [],
   "source": [
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
    "texts = text_splitter.split_documents(documents)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f9b30aff",
   "metadata": {},
   "source": [
    "## Initialize ChromaDB\n",
    "\n",
    "Create embeddings for each chunk and insert into the Chroma vector database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "c0d2a049",
   "metadata": {},
   "outputs": [],
   "source": [
    "embeddings = OpenAIEmbeddings()\n",
    "vectordb = Chroma.from_documents(texts, embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1ddb7866",
   "metadata": {},
   "source": [
    "## Create the chain\n",
    "\n",
    "Initialize the chain we will use for question answering."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "35164427",
   "metadata": {},
   "outputs": [],
   "source": [
    "qa = VectorDBQA.from_chain_type(llm=OpenAI(), chain_type=\"stuff\", vectorstore=vectordb)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cff96efe",
   "metadata": {},
   "source": [
    "## Ask questions!\n",
    "\n",
    "Now we can use the chain to ask questions!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "f5851c0b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds and that she has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. He also said that she is a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He described her as a consensus builder.\""
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "qa.run(query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "24cf9016",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
